(57) ABSTRACT

In a computer implemented method, an address book is
dynamically generated in a distributed mail service system.
The distributed mail service system includes a plurality of
client computers connected to a mail service system via a
network. Mail messages are stored in message files of the
mail service system. Each mail message is parsed and
indexed to generate a full-text index of the mail service
system. An address book mail message is generated, each
address book mail message including address information.
The address book mail messages are stored in the message
files, and parsed and indexed into the full-text index file. A
query is composed using a particular one of the plurality of
client computer systems to search the full-text index to
locate and retrieve selected ones of the address book mail
messages as the dynamic address book.

5 Claims, 10 Drawing Sheets 

TECHNIQUE FOR DYNAMICALLY GENERATING AN ADDRESS BOOK IN A
DISTRIBUTED ELECTRONIC MAIL SYSTEM

FIELD OF THE INVENTION

The present invention relates generally to electronic mail,
and more particularly to electronic mail messaging in a
distributed computer system.

BACKGROUND OF THE INVENTION

With the advent of large scale distributed computer systems
such as the Internet, the amount of information which
has become available to users of computer systems has
exploded. Among this information is electronic mail
(e-mail). With the improvements in means for composing
and distributing written messages, the amount of e-mail
traffic on the Internet has surged. It is not unusual for an
active Internet user to be exposed to tens of thousands of
e-mail messages a year.

As an advantage, the Internet allows users to interchange
useful information in a timely and convenient manner.
However, keeping track of this huge amount of information
has become a problem. As an additional advantage, the
Internet now allows users to exchange information in a
number of different presentation modalities, such as text,
audio, and still and moving images. Adapting e-mail systems
to organize such complex information, and providing
efficient means to coherently retrieve the information is not
trivial.

As a disadvantage, Internet users may receive junk-mail
whenever they send to mailing lists or engage in news
groups. There are numerous reported incidents where
specific users have been overwhelmed by thousands of
unwanted mail messages. Current filtering systems are
inadequate to deal with this deluge.

Known distributed systems for composing and accessing
e-mail are typically built around protocols such as Internet
Message Access Protocol (IMAP), Post Office Protocol
(POP), or Simple Mail Transfer Protocol (SMTP). Typically,
users must install compatible user agent software on any
client computers where the mail service is going to be
accessed. Often, a significant amount of state information is
maintained in the users' client computers. For example, it is
not unusual to store the entire mail database for a particular
user in his desk-top or lap-top computer. Normally, the users
explicitly organize mail messages into subject folders.
Accessing mail generally involves shipping entire messages
over the network to the client computer.

Such systems are deficient in a number of ways. Most
computers that a user will encounter will not be configured
with user agents compatible with the user's mail service.
Often, a user's state is captured in a specific client computer
which means that work cannot proceed when the user moves
to another computer. Managing large quantities of archival
mail messages by an explicit folder organization is difficult
for most users. Accessing mail over a low bandwidth
network tends to be unsatisfactory.

Therefore, it is desired to provide a mail system that
overcomes these deficiencies.

SUMMARY OF THE INVENTION

Provided is a computer implemented method for dynamically
generating an address book in a distributed mail
service system which includes a plurality of client computers
connected to a mail service system via a network. Mail
messages are stored in message files of the mail service
system. The mail messages are parsed and indexed to
generate a full-text index of the mail service system. Address
book mail messages are generated, each address book mail
message includes address information.

The address book mail messages are stored in the message
files. The address book mail messages are also parsed and
indexed into the full-text index file. A query is composed
using a particular one of the plurality of client computer
systems to search the full-text index to locate and retrieve
selected ones of the address book mail messages as the
dynamic address book.

The address information can be generated using a form
supplied by client mail application programs executing on
the particular client computer. The client mail application
programs are down-loaded to the particular client computer.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an arrangement of a
distributed mail service system which uses the invention;

FIG. 2 is a block diagram of a mail service system of the
arrangement of FIG. 1;

FIG. 3 is a block diagram of an account manager and
account records of the system of FIG. 2;

FIG. 4 is a block diagram of message and log files
maintained by the system of FIG. 2;

FIG. 5 is a flow diagram of a parsing scheme used for mail
messages processed by the system of FIG. 2;

FIG. 6 is a block diagram of a full-text index for the
message files of FIG. 4;

FIG. 7 is a diagram of a labeled message;

FIG. 8 is a diagram of an address book entry;

FIG. 9 is a flow diagram for filtering queries; and

FIG. 10 is a block diagram for a Multipurpose Internet
Mail Extensions (MIME) filter.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Overview

In FIG. 1, an arrangement 100 provides a distributed mail
service having features according to the invention. In FIG.
1, one or more client computers 111-113 are connected via
a network 120 to a mail service system 200 described in
greater detail below.
Client Computers

The clients 111-113 can be workstations, palm-tops,
network computers (NCs), or any other similarly configured
computer system. The clients 111-113 can be owned,
borrowed, or rented. It should be noted that in practice, the
clients 111-113 can potentially be any of the millions of
personal computer systems that are currently extant and
connected to a network. Over time, a user may use different
client computers at different locations.

As shown for computer 111, each client computer
executes standard operating system software (O/S) 114, e.g.,
UNIX (tm), Windows95 (tm), MacOS (tm) or NT (tm). The
O/S 114 is used to execute application software programs.
One of the application programs which can execute on the
client 110 is a Web browser 115. The Web browser 115 can
be Netscape Navigator, Microsoft Explorer, Hot Java, and
other similar browsers.

The functionality of the browser 115 can be extended by
forms, applets, and plug-ins generally indicated by reference
numeral 116. In the preferred embodiment, the browser
extensions are in the form of client mail application
programs described in greater detail below. The client mail
application programs are downloaded over the network 120
from the mail service system 200. The extensions can be
implemented using Hypertext Markup Language (HTML),
JavaScript, Java applets, Microsoft ActiveX, or combinations
thereof to provide maximum portability.

As shown for computer 112, the client includes one or
more processors (P) 117, memories (M) 118, and input/output
interfaces (I/O) 119 connected to each other by a bus 120.
The processors 117 can implement Complex Instruction Set
Computing (CISC) or Reduced Instruction Set Computing
(RISC) architectures in 32, 64, or other bit length data
structures. The memories 118 can include solid state
dynamic random access memory (DRAM), and fixed and 
removable memories such as hard disk drives, CD-ROMs, 
diskettes, and tapes. The I/O 119 can be connected to input 
devices such as a keyboard and a mouse, and output devices 
such as a display and a printer. The I/O 119 can also be 
configured to connect to multi-media devices such as sound- 
cards, image processors, and the like. The I/O also provides 
the necessary communications links to the network 120. 
Network 

In the preferred embodiment, the network 120 includes a 
large number of public access points, and communications 
are carried out using Internet Protocols (IP). Internet proto- 
cols are widely recognized as a standard way of communi- 
cating data. Higher level protocols, such as HyperText 
Transfer Protocol (HTTP) and File Transfer Protocol (FTP), 
communicate at the application layer, while lower level 
protocols, such as Transmission Control Protocol/Internet 
Protocol (TCP/IP) operate at the transport and network 
levels. 



Part of the Internet includes a data exchange interface
called the World-Wide-Web, or the "Web" for short. The
Web provides a way for formatting, communicating,
interconnecting, and addressing data according to standards
recognized by a large number of software packages. For
example, using the Web, multi-media (text, audio, and
video) data can be arranged as Web pages. The Web pages
can be located by the browser 115 using Uniform Resource
Locators (URLs).

A URL specifies the exact location of a Web-based
resource such as a server or data record. The location can
include domain, server, user, file, and record information,
e.g., "HTTP://www.digital.com/-userid/file.html/~record".

An Internet service can be used to send and receive mail
messages. For example, a mail message can be sent to
the address "jones@mail.digital.com" using the SMTP
protocol. As an advantage, the Internet and the Web allow
users, with only minor practical limitations, to exchange
data no matter where they are using any type of computer
equipment.
Intranet

The mail service system 200 includes one or more server
computers. Usually, the system 200 is part of some private
network (intranet) connected to the public network 120.
Typically, an intranet is a distributed computer system
operated by some private entity for a selected user base, for
example, a corporate network, a government network, or
some commercial network.
Firewall

In order to provide security protection, communications
between components of the Internet and the intranet are
frequently filtered and controlled by a firewall 130. The
purpose of the firewall 130 is to enforce security policies of
the private intranet. One such policy may be "never allow a
client computer to directly connect to an intranet server via
the public portion of the Internet." The firewall, in part,
protects accesses to critical resources (servers and data) of
the intranet.

Only certain types of data traffic are allowed to cross the
firewall 130. Penetration of the firewall 130 is achieved by
a tunnel 131. The tunnel 131 typically performs a secure
challenge-and-response sequence before access is allowed.
Once the identity of a user of a client has been authenticated,
the communications with components of the intranet are
performed via a proxy server, not shown, using secure
protocols such as Secure Sockets Layer (SSL) and X.509
certificates.
Mail Service System

The mail service system 200 can be implemented as one
or more server computers connected to each other either
locally, or over large geographies. A server computer, as the
name implies, is configured to execute server software
programs on behalf of client computers 111-113.
Sometimes, the term "server" can mean the hardware, the
software, or both because the software programs may
dynamically be assigned to different server computers
depending on load conditions. Servers typically maintain
large centralized data repositories for many users.

In the mail system 200, the servers are configured to
maintain user accounts, to receive, filter, and organize mail
messages so that they can readily be located and retrieved,
no matter how the information in the messages is encoded.
General Operation

During operation of the arrangement 100, users of the
client computers 111-113 desire to perform mail services.
These activities typically include composing, reading, and
organizing mail messages. Therefore, the client computers
can make connections to the network 120 using a public
Internet service provider (ISP) such as AT&T or Earthlink.
Alternatively, a client computer can be connected to the
Internet at a "cyber-cafe" such as Cybersmith, or the intranet
itself via a local area network. Many other connection
mechanisms can also be used. Once a connection has been
made, a user can perform any mail service.

As an advantage, structural and functional characteristics 
of the arrangement 100 include the following. Mail services 
of the system 200 are available through any Web-connected 
client computer. The users of the services can be totally 
mobile, moving among different clients at will during any of
the mail activities. Composition of a mail message can be
started on one client, completed on another, and sent from
yet another computer.

These characteristics are attained, in part, by never locking
a user's state in one of the client computers in case
access is not possible at a later time. This has the added
benefit that a client computer's local storage does not need
to be backed up because none of the important data reside
there. In essence, this is based on the notion that the
operating platform is the Web, thus access to the mail service
system via the Web is sufficient to access user data.

The service system will work adequately over a wide 
range of connectivity bandwidths, even for mail messages 
including data in the form of multi-media. Message retrieval
from a large repository is done using queries of the full-text
index without requiring a complex classification scheme.

The arrangement 100 is designed to incorporate redun- 
dancy techniques such as multiple access paths, and repli- 
cated files using redundant arrays of independent disks 
(RAID) technologies. 
Mail Service System

As shown in FIG. 2, the mail service system includes the 
following components. The system 200 is constructed to 
have as a front-end a Web server 210. The server 210 can be
the "Apache" Web server available from the WWW Consortium.
The Web server 210 interacts with back-end common gateway
interface (CGI) programs 220. The programs interface with
an account manager 300, an SMTP mail server 240, and an
index server 250. The CGI programs 220
are one possible mechanism. The programs could also be
implemented by adding the code directly to the Web server 
210, or by adding extensions to the Netscape Server Appli- 
cation Programming Interface (NSAPI) from Netscape. 

The top-level functions of the system 200 include send
mail 241, receive mail 242, query index 243, add/remove
label to/from mail 244, and retrieve mail 245. Different
servers can be used for the processes which implement the
functions 241-245.
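
The five top-level functions can be pictured as a small dispatch table. The following is a minimal sketch only: the `MailFunction` enumeration, the `dispatch` helper, and the stand-in handlers are hypothetical names introduced for illustration, and the described system routes these requests through CGI programs rather than through Python.

```python
from enum import Enum


class MailFunction(Enum):
    """Top-level functions, numbered with the reference numerals used above."""
    SEND_MAIL = 241
    RECEIVE_MAIL = 242
    QUERY_INDEX = 243
    CHANGE_LABEL = 244   # add/remove label to/from mail
    RETRIEVE_MAIL = 245


def dispatch(function, handlers, **request):
    """Route a request to the handler registered for the given function.

    `handlers` is a hypothetical mapping from MailFunction to a callable;
    each callable could be served by a different server process.
    """
    try:
        handler = handlers[function]
    except KeyError:
        raise ValueError(f"no handler registered for {function.name}")
    return handler(**request)


# Illustrative stand-in handlers; real ones would talk to the mail
# server 240 or the index server 250.
handlers = {
    MailFunction.QUERY_INDEX: lambda terms: f"searching for {terms}",
    MailFunction.RETRIEVE_MAIL: lambda msg_id: f"retrieving {msg_id}",
}
```

Keeping the table separate from the handlers mirrors the statement that different servers can implement different functions.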

The account manager 300 maintains account information.
The mail server 240 is used to send and receive mail
messages to and from other servers connected to the
network. The index server 250 maintains mail messages in
message files 400, and a full-text index 500 to messages. The
CGI programs 220 also interact with the message files 400
via a filter 280 for mail message retrieval.

The Web server 210 can be any standard Web server that
implements the appropriate protocols to communicate via
the network using HTTP protocols 201, for example the
Apache server. The CGI back-end programs 220 route
transactions between the Web server 210 and the operational
components of the mail service system. The CGI back-end
220 can be implemented as C and TCL programs executing
on the servers.
Account Manager

As shown in FIG. 3, the account manager 300 maintains
account information 301-303 for users who are allowed to
have access to the mail system 200. Information maintained
for each account can include: mailbox address 310, e.g., in
the form of a "Post Office Protocol" (POP-3) address, user
password 320, label state 330, named queries 340, filter
queries 350, query position information 360, user preferences
370, and saved composition states 380. The full
meaning and use of the account information will become
apparent as other components of the system 200 are
described.

As an introduction, passwords 320 are used to authenticate
users. Labels 330 are used to organize and retrieve mail
messages. Labels can be likened to annotated notes that can
be added to and removed from messages over their lifetimes;
in other words, labels are mutable. Labels help users organize
their messages into subject areas. At any one time, the label
state captures all labels that are active for a particular user.
Labels will be described in greater detail below.

In the system 200, mail messages are accessed by using
queries. This is in contrast to explicitly specifying subject
folders as are used in many known mail systems. A query is
composed of one or more search terms, perhaps connected by
logical operators, that can be used to retrieve messages. By
specifying the name of a query, a user can easily retrieve
messages related to a particular topic, phrase, date, sender,
etc. Named queries 340 are stored as part of the account
information.

Some queries can be designated as "filter" queries 350.
This allows a user to screen, for example, "junk mail,"
commonly known as spam. Filter queries can also be used
to pre-sort messages received from particular mailing lists.
Query position information records which message the user
last selected with a query. This way the user interface can
position the display of messages with respect to the selected
message when the query is reissued. User preferences 370
specify the appearance and functioning of the user interface
to the mail service as implemented by the extended browser
116 of FIG. 1. Saved composition states 380 allow a user to
compose and send a message using several different client
computers while preparing the message.

The account manager 300 can generate a new account, or
delete an existing account. The account is generated for a
user by specifying the user name and password. Once a
skeletal account has been generated, the user can supply the
remaining information such as labels, named queries, filter
queries, and so forth.
Mail Server

Now with continued reference to FIG. 2, the mail service
system 200 receives (242) new mail messages by
communicating with the mail server 240 using the POP-3
protocol. Messages are sent (241) using the SMTP protocol.
The mail server 240 is connected to the Internet by lines 249.
The appropriate routing information in the mail server 240
for a particular user can be generated after the user's account
has been generated. A "POP Account Name" should be
specified as the user's name. In most systems, the name will
be case sensitive. The "POP Host" should be the Internet
domain name of the mail server 240. Here, the case of the
letters is ignored. An IP address such as "16.4.0.16" can be
used, although the domain name is preferred. In some cases,
a particular user's preferred Internet e-mail address may be
unrelated to the POP Account Name, or the POP Host.

The rapid expansion in the amount of information which
is available has made it difficult to locate pertinent
information. The question "in which folder did I store that
message?," becomes more difficult to answer if the number
of messages that one would like to save increases over long
time periods to many thousands. The importance and
frequency of accessed messages can vary.

Traditionally, the solution has been to structure the mail
messages in a hierarchical manner, e.g., files, folders,
sub-folders, sub-sub-folders, etc. However, it has been
recognized that such structures do not scale easily because
filing strategies are not consistent over time. Many users find
that hierarchical structures are inadequate for substantial
quantities of e-mail messages accumulated over many years,
particularly since the meaning and relation of messages
changes over time. Most systems with an explicit filing
strategy require constant and tedious attention to keep the
hierarchical ordering consistent with current needs.

Proposed is an alternative e-mail management strategy.
Message Repository

Messages are stored in message files 400 and a full-text
index. The organization of the message files is first
described. This is followed by a description of the full-text
index 500. As a feature of the present invention, user
interaction with the mail messages is primarily by queries
performed on the full-text index 500.

As shown in FIG. 4, the index server 250 assigns each
received message 401-402 a unique identification (MsgID)
410. The MsgID 410 is composed of a file identification
(FileID) 411, and a message number (MsgNum) 412. The
FileID "names," or is a pointer to a specific message file 420,
and the MsgNum is some arbitrary numbering of messages
in a file, e.g., an index into the file 420.

A message never changes after it has been filed. Also, the 
MsgID 410 forever identifies the same message, and is the 
only ID for the message. In the referenced message file 420,
a message entry 430 includes the MsgNum stored at field 
431, labels 432, and the content of the message itself in field 
433. 
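
The identifier scheme and the message entry layout can be sketched as follows. This is an illustrative sketch under stated assumptions: the class names `MsgID`, `MessageEntry`, and `MessageFiles` and their methods are hypothetical, an in-memory list stands in for each on-disk message file, and MsgNum is taken to be a simple index into the file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class MsgID:
    """Permanent message identifier: FileID 411 plus MsgNum 412."""
    file_id: str   # names a specific message file
    msg_num: int   # position of the entry within that file


@dataclass
class MessageEntry:
    msg_num: int                                    # field 431
    content: str                                    # field 433, fixed once filed
    labels: Set[str] = field(default_factory=set)   # field 432, may change


class MessageFiles:
    """Append-only collection of message files keyed by FileID (assumption)."""

    def __init__(self) -> None:
        self._files: Dict[str, List[MessageEntry]] = {}

    def file_message(self, file_id: str, content: str) -> MsgID:
        entries = self._files.setdefault(file_id, [])
        msg_num = len(entries)   # simple index into the file
        entries.append(MessageEntry(msg_num, content))
        return MsgID(file_id, msg_num)

    def fetch(self, msg_id: MsgID) -> MessageEntry:
        return self._files[msg_id.file_id][msg_id.msg_num]
```

Making `MsgID` frozen reflects the requirement that an identifier forever refers to the same message; only the label set of an entry is expected to change.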
The number of separate files 420 that are maintained for
storing messages can depend on the design of the underlying 
file system and specific implementation details. For 
example, the size and number of entries of a particular file 
may be limited by the file system. Also, having multiple files 
may facilitate file maintenance functions such as back-up 
and restore. 
Label Log 

Although a message may never change, the set of labels 
associated with a message may change. Because labels can 
change, a transaction log 440 is also maintained. The log 440 
includes "add" entries (+label) 450, and "remove" entries 
(-label) 460. Each entry includes the MsgID 451 or 453 of
the affected message entry, and the label that is being added
(452) or deleted (453). The contents of the log 440 are
occasionally merged with the message files 420. Merged
entries are removed from the log 440. The label log 440 
allows for the mutation of labels attached to data records 
such as mail messages, where the labels and the data which 
are labeled are stored in the same index. 
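
A transaction log of this kind might look like the sketch below. It is a simplified illustration, not the described implementation: `LogEntry`, `LabelLog`, and `merge_into` are hypothetical names, and a plain dictionary from MsgID to a set of labels stands in for the label fields 432 of the message files.

```python
from collections import namedtuple

# One transaction-log entry: op is "+" (add) or "-" (remove).
LogEntry = namedtuple("LogEntry", ["op", "msg_id", "label"])


class LabelLog:
    def __init__(self):
        self.entries = []

    def add(self, msg_id, label):
        self.entries.append(LogEntry("+", msg_id, label))

    def remove(self, msg_id, label):
        self.entries.append(LogEntry("-", msg_id, label))

    def merge_into(self, stored_labels):
        """Apply entries in order to `stored_labels` (msg_id -> set of labels),
        then drop the merged entries from the log."""
        for entry in self.entries:
            labels = stored_labels.setdefault(entry.msg_id, set())
            if entry.op == "+":
                labels.add(entry.label)
            else:
                labels.discard(entry.label)
        self.entries.clear()
```

Because entries are applied in arrival order, a label that is added and later removed ends up absent after the merge.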
Full-Text Index 

FIGS. 5 and 6 show how the index server 250 generates 
the full-text index 500. Newly received mail messages are 
processed in batches 403-404. Messages 401 and 402 of a 
batch are parsed into individual words 510. A batch 403 in 
a large mail service system may include hundreds or thou- 
sands of messages. The words of the messages are parsed in 
the order that they are received in a batch. Each word is 
arbitrarily assigned a sequential location number 520. 
For example, the very first word of the very first message
of the very first batch is assigned location "1," the next word
location "2," and the last word location "3." The first word
of the next message is assigned the next sequential location
"4," and so forth. Once a location has been assigned to a
word, the assignment never changes. If the location is
expressed as a 64 bit number, then it is extremely unlikely
that there will ever be an overlap on locations.

As the messages are parsed, the indexing process generates
additional "metawords" 530. For example, an end-of-message
(eom) metaword is generated for the last word of
each message. The metawords are assigned the same
locations as the words which triggered their generation. In the
example shown, the location of the first eom metaword is
"3," and the second is "5."

Other parts of the message, such as the "To," "From,"
"Subject," and "Date" fields may generate other distinctive
metawords to help organize the full-text index 500.
Metawords help facilitate searches of the index. Metawords are
appended with predetermined characters so that there is no
chance that a metaword will ever be confused with an actual
parsed word. For example, metawords include characters
such as "space" which are never allowed in words.
Hereinafter, the term "words" means both actual words and
synthesized metawords.

After a batch of messages has been parsed, the words
and their assigned locations are sorted 540, first according to
the collating order of the words, and second according to their
sequential locations. For example, the word "me" appears at
locations "3" and "5" as shown in box 550. The sorted batch
550 of words and locations is used to generate the index.
Each sorted batch 550 is merged into the index 500, initially
empty.

Index Structure

FIG. 6 shows the logical structure of an index 600
according to the preferred embodiment. The index includes
a plurality of word entries 610. Each word entry 610 is
associated with a unique "word," that appeared at least once
in some indexed message. The term "word" is used very
loosely here, since the parsing of the words in practice
depends on which marks/characters are used as word
separators. Words do not need to be real words that can be found
in a dictionary. Separators can be spacing and punctuation
marks.

The indexer 250 will parse anything in a message that can
be identified as a distinct set of characters delineated by
word separators. Dates are also parsed and placed in the
index. Dates are indexed so that searches on date ranges are
possible. In an active index there may well be millions of
different words. Therefore, in actual practice, compression
techniques are extensively used to keep the files to a
reasonable size, and allow updating of the index 500 as it is
being used. The details of the physical on-disk structure of
the index 600, and the maintenance thereof are described in
U.S. Pat. No. 5,745,899, entitled "A Method for Indexing
Information of a Database", issued to Michael Burrows on
Apr. 28, 1998, incorporated in its entirety herein by
reference.

The word entries 610 are stored in the collating order of
the words. The word is stored in a word field 611 of the entry
610. The word field 611 is followed by location fields (locs)
612. There is one location field 612 for every occurrence of
the word 611. As described in the Burrows reference, the
locations are actually stored as a sequence of delta-values to
reduce storage. The index 600 is fully populated. This means
the last byte 614 of the last location field of a word is
immediately followed by the first byte 615 of the next word
field.
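
The parsing, sorting, and delta-encoding steps described for the full-text index can be sketched in a few lines. This is a toy illustration under stated assumptions: the function names are hypothetical, whitespace and common punctuation are assumed to be the word separators, Python's default string ordering stands in for the collating order, and the compressed on-disk format of the Burrows reference is not reproduced.

```python
import re
from collections import defaultdict

EOM = " eom"  # metaword; the leading space can never occur in a parsed word


def parse_batch(messages, next_location=1):
    """Return (word, location) pairs for a batch, plus the next free location."""
    pairs = []
    location = next_location
    for text in messages:
        words = [w for w in re.split(r"[\s.,;:!?]+", text.lower()) if w]
        for word in words:
            pairs.append((word, location))
            location += 1
        if words:
            # The metaword shares the location of the word that triggered it.
            pairs.append((EOM, location - 1))
    return pairs, location


def build_entries(pairs):
    """Sort by word, then by location, and group the locations per word."""
    entries = defaultdict(list)
    for word, location in sorted(pairs):
        entries[word].append(location)
    return dict(entries)


def delta_encode(locations):
    """Store each location as the difference from the previous one."""
    deltas, previous = [], 0
    for location in locations:
        deltas.append(location - previous)
        previous = location
    return deltas
```

With the two-message batch `["call with me", "email me"]`, the word "me" and the eom metaword both land at locations 3 and 5, matching the example in the text.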
Labels

Labels provide a way for users to annotate mail messages. 
Attaching a label to a message is similar to affixing a note 
to a printed document. Labels can be used to replace the 
folder mechanisms used by many prior art mail systems. 
However, a single mail message can be annotated with 
multiple labels. This compares favorably to folder-based 
systems where a message can only be stored in a single 
folder. 

Users can define a set of labels with which to work. The 
labels are nothing more than predefined text strings. The 
currently active set of labels for a particular user, e.g., the
label state 330 of FIG. 3, is maintained by the account
manager 300 and is displayed in a window of the graphical
user interface. Labels can be added and removed by the
system or by users. 

As shown in FIG. 6, labels are stored in a data structure 
650 that parallels and extends the functionality of full-text 
index 500. Labels are subject to the same constraints as 
index words. Also, queries on the full-text index 500 can 
contain labels, as well as words, as search terms. A label is 
added to a mail message by adding a specific index location 
(or locations) within the message to the set of locations 
referred to by the specified label. Label removal is the 
opposite. Operations on labels are much more efficient than 
other operations that mutate the state of the full-text index. 
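
Because labels share the location space of indexed words, a query can treat both kinds of term uniformly. The sketch below illustrates this under several assumptions: all function names are hypothetical, a single dictionary holds both word and label entries, a label is attached at the end-of-message location of its message, and the optional `exclude` argument is an illustrative addition for negated terms.

```python
from bisect import bisect_left


def message_of(location, eom_locations):
    """Map an index location to a message number, given the sorted
    end-of-message locations."""
    return bisect_left(eom_locations, location)


def attach_label(label, message_number, index, eom_locations):
    """Add a location inside the message to the label's set of locations."""
    index.setdefault(label, []).append(eom_locations[message_number])


def messages_matching(term, index, eom_locations):
    return {message_of(loc, eom_locations) for loc in index.get(term, [])}


def query_and(terms, index, eom_locations, exclude=()):
    """Messages containing every term (word or label) and none of `exclude`."""
    result = None
    for term in terms:
        hits = messages_matching(term, index, eom_locations)
        result = hits if result is None else result & hits
    result = result or set()
    for term in exclude:
        result = result - messages_matching(term, index, eom_locations)
    return result
```

Attaching or removing a label only touches the small location list of that label, which is one way to read the remark that label operations are cheaper than other index mutations.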

The on-disk data structure for the label index 650 that 
represents the label state 330 is the same as that described
for index word entries 600. This means that the label state 
can be thought of as an extension of the full-text index 500. 
Accordingly, the label index extension, like the index 500, 
maps labels (words) 651 to sequences of index locations 
652. 

Although the structural formats of the label extension 650 
and the full-text index 500 are the same, for efficiency 
reasons, the label portion of the index is managed by a 
software component that is distinct from the software that 
manages the full-text index 500. If a term of a query string
is found to be a label, then the label index 650 is searched 
to provide the necessary location mapping. This mapping is 
further modified by the label log 440 that contains all recent 
label mutations (additions or removals). The label log 440 
can include an in-memory version 660. Since operations on 
this structure are in-memory, updates for recent label muta- 
tions 660 can be relatively fast while the updating of the 
label index 650 can take place in background. 
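
The interplay between the label index and the in-memory log of recent mutations can be sketched as an overlay. This is a minimal sketch with assumed data shapes: the function names are hypothetical, the label index is a dictionary from label to a sorted list of locations, and log entries are simplified to `(op, label, location)` tuples rather than the MsgID-based entries of the transaction log.

```python
def effective_locations(label, label_index, recent_log):
    """Locations for `label`: the stored mapping adjusted by recent mutations.

    label_index: dict mapping label -> sorted list of locations
    recent_log:  list of (op, label, location) tuples in arrival order,
                 where op is "+" or "-"
    """
    locations = set(label_index.get(label, ()))
    for op, logged_label, location in recent_log:
        if logged_label != label:
            continue
        if op == "+":
            locations.add(location)
        else:
            locations.discard(location)
    return sorted(locations)


def merge_in_background(label_index, recent_log):
    """Fold the logged mutations into the label index and empty the log."""
    for label in {entry[1] for entry in recent_log}:
        label_index[label] = effective_locations(label, label_index, recent_log)
    recent_log.clear()
```

Queries see a consistent answer whether or not the background merge has run, since `effective_locations` always applies whatever is still in the log.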


As shown in FIG. 7, a message 700 includes a header 701
and a body 702. The header 701 typically includes the "To,"
"From," "Date," and "Subject" fields. The header may also
include routing information. The body 702 is the text of the
mail message.

Each mail message can initially receive two labels,
"inbox" 710 and "unread" 720. Messages labeled as
"unread" 720 have not yet been exposed for reading.
Messages with the "inbox" label 710 are deemed to require the
user's attention. As will be described below, it is possible for
messages to be labeled as unread but not have the inbox
label. These less important messages can be read by the user
as needed.

Displaying or printing a message
removes the unread label 720 under the assumption that it
has been read. A user can explicitly add or remove the
unread label. A message can be deleted by attaching a
"deleted" label 730. This has the effect that the message will
not be seen again because messages labeled as deleted are
normally excluded during searches. Removing the deleted
label has the effect of "un-deleting" a message.

Although a preferred embodiment uses labels for data
records that are mail messages, it should be understood that
"mutable" labels can also be used for other types of data
records. For example, labels which can be added and
removed can be used with data records such as Web pages,
or news group notes. The key feature here being that labels
are indexed in the same index as the record which they label,
and that labels can be added and removed.
Queries

... "deleted" label, e.g., "and not deleted." This has the effect of
hiding all deleted messages from the user of the client. There
is an option in the user interface which inhibits this effect to
make deleted messages visible.
Named Queries

Queries can be "named." Named queries are maintained
by the account manager 300. By specifying the name of a
query, users can quickly perform a search for e-mail
messages including frequently used terms. Users can compose
complex queries to match on some pattern in indexed
messages, perhaps intermixing conditions about messages
having particular text or labels, and to keep the query for
subsequent use.

Named queries can be viewed as a way for replacing prior
art subject folders. Instead of statically organizing messages
into folders according to predetermined conditions, queries
allow the user to retrieve a specific collection of messages
depending on a current set of search terms. In other words,
the conditions which define the collection are dynamically
expressed as a query.
History List

Recently performed queries are kept in a history list.
Accordingly, frequently performed queries can readily be
re-issued, for example, when the index has been changed
because of newly received mail, or because of actions taken
by other client computers.
Dynamic Address Book

Queries can also be used to perform the function of prior
art "address books." In many known e-mail systems, users
keep address books of frequently used addresses. From time 
to time, users can add and remove addresses. There, the 
address books are statically maintained as separate data 
structures or address book files. For example, there can be 
"personal" and "public" related address books. In contrast, 
here, there is no separately stored address book. I astead ra n — ■> _ 
"^ address book" is;d ynaiiiicaJlyrgeneratea^aTit"is needed. TlierzyS] 
dy namic address book is g enerated from-the files 400 a nd;then? I 1 
full-text.index-SO Oas-followsr ^ <? — '3 
Asshown in FIG. 8, a-userof:a:clien t;com puter820;caii? 



t er e-mail-mes sages have bjex;indexed:ana^labele^,-tte^40 generate.addres^boojt-typejn 800 



mes^ag es-can-be-retri eye'dlblCis^i^ , 
^qu^ warches^fOTme^age^that ;match7on.wordsand'labeIs-' 
spec^e^:i5uh"e.query. This is in contrast with known mail 
systems where users access mail by remembering in which 



suppliedjV one^fjhe^client'mailap plication programs-Uk 
The form 800"includes, for example, entry fields 801-803 for 
address related information such as name, phone number, 
n (hard-copy) jgail ajdjgss, and (soft-copy^ e-mail-address^ 



file, folder, or sub-folder messages have been placed so the 45 \ and so forth. Alternatively, < al3drcssrirtfoTmation TcanZbe?' ~~| I 
folder can be searched. As an ^advanUge^of^the: present / i selected from-a priorre ceived jmail Wssage . 805,by^licking-7 1^. 
svstem.-use^onlv need:to:recaU-some-w^ J i 

find.matching.messages^ [ C805;j/ £Tp^ 
The syntax of the query language is similar as described 



„j ^ } o o From the perspective of the mail service system 200 and ft 

in the Burrows reference. A query includes a sequence of 50 the index server 250, the-add rosslbo'ok 3irif6rniatiQnIis3? | 



handle d exactl y, as a received mail message^This means that, 
for example, the data.of the fields 801-803 are combined 
into an "address book" mail message 810. A n "addrga sT 



primitive query terms, combined by operators such as "and, ! 
"or," "not," "near," and so forth. A primitive term can be a 

sequence of alpha-numeric characters, i.e., a "word," with- r ;-— - .. ... , — . . 

out punctuation marks. If the 'terms-are-enclosed-withoutT? yabel 809 can also be added to the entry .using the labeling 
quotationjiiarks ("), trie-search" &for anexact matcrr o nthe -55' convention as described herein. The address book mail 
quoted^^sUmg.v'A term can be a label. c A-tenn 7Such-as^ Jmessage 810 and label 809 can be stored in one ot tne" 
-'^£romrfred^searches-for-messages. with:t he~ word - "f red"' in? message^files -400. Additionally th e message 810 can be 
t he"from" field^f TmessageTneadcr.^Sim parsed and inserted into the full-text index 500 as are the 



f ormula ted-for the "to;"-"fromr"ccr-and-"sub j ect~ficlds of- 
(- the~headerr~ ~7 60 
A term such as "Nov. 2, 1996-Dec. 25, 1996" searches for 
all messages in the specified date range. The parsing of dates 
is flexible, e.g., Dec. 25, 1996, Dec. 25, 1996, and Dec. 25, 
1996 all mean the same date. In the case of ambiguity (Feb. 
1, 1996) the European order (day/month) is assumed. 65 

During normal operation, the CGI program 220 modifies 
each issued query by appending a term which excludes the 



•words and labels of any other mail message. In other words, 
the address information of form 800 is merged and blended 
with the full-text index 500. 

After the address information has been filed and indexed, 
the^address:informadon;can:b«;retrievcd:by-the-user-of:the> 
client „computer_820-xo^posjng^a;queryr830 -uSing rthe— 
stan^r^querx.ihter£ace, with perhaps, the label "address'> 
t asTonej^fj^hej]ujejry3ernis. The exact content to be retrieved 
is^etermined"arthe _ time that the terms and operator of the 
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^quej. v-830-are-composed-b y-jhe^rnser. The- address^ 
, — information; i.e. l t ppe or more address-book mai l'messages, 
which satisfies the query is returned to the client computer 
82 0-as-mc-d ynarnic;address:boi>k 840. The user can theo 
select one.of .the addresscs- as a-"to" address for a n e,w,-replyr> 5 
or; forward .maU.rhessageT 
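The dynamic address book described above can be sketched as follows, assuming a toy in-memory index. The names `MailIndex` and `address_book_message`, the tokenizing rule, and the `label:` key prefix are all hypothetical choices made for this illustration. The point it shows is that address entries are stored and indexed exactly like any other message, and the "address book" exists only as the result of a query.

```python
import re
from collections import defaultdict


class MailIndex:
    """Hypothetical stand-in for the message files 400 and full-text index 500."""

    def __init__(self):
        self.messages = {}                # message id -> message text
        self.postings = defaultdict(set)  # word or label key -> message ids

    def store(self, text, labels=()):
        """File a message, then index its words and labels in the same index."""
        message_id = len(self.messages) + 1
        self.messages[message_id] = text
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[word].add(message_id)
        for label in labels:
            # Assumed convention: labels share the index under a "label:" prefix.
            self.postings["label:" + label].add(message_id)
        return message_id

    def search(self, words=(), labels=()):
        """Return messages matching every given word and label (an "and" query)."""
        terms = [w.lower() for w in words] + ["label:" + name for name in labels]
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.messages[i] for i in sorted(hits)]


def address_book_message(name, phone, email):
    """Combine form fields into the text of an "address book" mail message."""
    return f"Name: {name}\nPhone: {phone}\nE-mail: {email}"


index = MailIndex()
index.store("Lunch on Friday? Regards, Fred")
index.store(address_book_message("Fred Smith", "555-0100", "fred@example.com"),
            labels=["address"])
index.store(address_book_message("Jon Doe", "555-0101", "jon@example.com"),
            labels=["address"])

# The address book is whatever the query returns at the time it is issued.
dynamic_address_book = index.search(words=["fred"], labels=["address"])
```

Because the ordinary lunch message also contains the word "fred," it is the "address" label term that restricts the result to address entries.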
Message Resemblance

It is also possible to search for messages which resemble
a currently selected message. In this case a document
resemblance technique can be used. Such a technique is
described in U.S. patent application Ser. No. 08/665,709
entitled "A Method for Determining the Resemblance of
Documents", filed by Broder et al. on Jun. 18, 1996,
incorporated in its entirety herein by reference. This allows a
user to find all messages which closely relate to each other.
Sorting Search Results

When a search for an issued query completes, the results
of the search are presented in an order according to their
MessageID 411, FIG. 4. In practice, this means that
qualifying messages are presented in the temporal order of when
the messages were received.

Most prior art e-mail systems allow other sort orders, such
as by sender, or by message thread (a sequence of related
messages). There is no need for such capabilities here.
Consider the following possibilities.

Messages from a particular user can be specified by
including in a query a "from" term. This will
locate only messages from a particular user. A user can select
messages of a particular "thread" by using the "view-discussion"
option of the user interface described below. As
stated above, messages for a particular date range can be
specified in the query.
Filtering Messages

In order to facilitate mail handling, particularly for someone
receiving a large amount of e-mail, a user can configure
the filter 280 to his or her own preferences as shown in FIG.
9. A message filter is specified as one or more named "filter"
queries 910. The named query 910 is stored as part of the
account information of FIG. 3. The named filter query 910
can be composed on a client computer 920 using the client
mail application programs downloaded from the mail service
system 200.

New messages 930 received by the mail service system
200 are stored, parsed, and indexed in the message files 400
and full-text index 500 as described above. In addition, each
new message 930 can be compared with the named queries
910. If the content of a new message 930 does not match any
of the named filter queries 910, then the new message 930
is given the inbox label 710 and the unread label 720, i.e.,
the message is placed in the "In-box" 940 for the user's
attention. Otherwise, the new message 930 is only given the
unread label 720.

For example, mail which is sent out typically has a "from"
field including the name of the sender, e.g., "From: Jon
Doe," in the message header. Alternatively, the body of the
mail message may include the text, "You are getting this
message from your good friend Jon Doe." The user Jon Doe
can set up a named filter query "SentByME" as "From near
(Jon Doe)". This query will match any message which
contains the word "from" near the phrase "Jon Doe."
The effect is that users do not explicitly become aware of
messages that match on the filter query 910. For example, a
user may want to filter messages which are "cc" copies to
oneself. A user may also desire to filter out junk e-mail
messages arriving from commercial e-mail distributors at
known domains, or pre-sort messages received via mailing
lists.
Message Display Options

From the user's perspective, access to the mail service is
implemented as extensions to the Web browser, such as Java
applets. Messages are normally displayed by their primary
component being transmitted to the client in HTML
format and being displayed in the Java applet's window.
The first line of a displayed message contains any "hot-links"
which the user can click to display the message in one
of the Web browser's windows, either with the HTML
formatting, or as the original, uninterpreted text.

It should be noted that headers in Internet messages,
depending on routing, can be quite lengthy. Therefore, it is
possible to restrict the view to just the "from," "to," "cc,"
"date," and "subject" fields of the header.
Embedded Links

When displaying retrieved messages, the system 200
heuristically locates text strings which have the syntax of
e-mail addresses. If the user clicks on one of these addresses,
then the system will display a composition window,
described below, so that the user can easily generate a reply
message to the selected e-mail address(es).

Similarly, when displaying retrieved messages, the system
200 heuristically locates text strings that have the syntax
of a URL, and makes the string a hot-link. When the user
clicks on the hot-link, the URL is passed to the browser,
which will retrieve the contents over the network, and
process the content in the normal manner.

The system also attempts to detect components in
messages, such as explicitly "attached" or implicitly
"embedded" files. The files can be in any number of possible
formats. The content of these files is displayed by the
browser 115. The specific display actions used will depend 
on how the browser is configured to respond to different 
component file formats. 

For some file formats, for example Graphics Interchange
Format (GIF) and Joint Photographic Experts Group (JPEG),
the component can directly be displayed. It is also possible
to configure the browser with a "helper" applet to "display"
attached files having specific format types as "icons." For
example, the message may be in the form of an audio
message, in which case the message needs to be "said," and
not displayed. For some message formats, the browser may
store some of the content in the file system of the client
computer.

Low-Bandwidth Filtering 

Since the client computers 111-113 may access the mail 
service system via low-bandwidth network connections, an 
attempt is made to minimize the amount of data that are sent 
from the mail service system to the client computers. Even 
over high-speed communications channels, minimizing the 
amount of network traffic can improve user interactions. 

Because the mail service system 200 allows mail mes- 
sages to include attached or embedded multi-media files, 
mail messages can become quite large. In the prior art, the 
entire mail message, including files, is typically shipped to
the client computer. Thus, any part of the mail message can 
immediately be read by the user after the message has been 
received in the client. 

As shown in FIG. 10, the mail service system 200 can
recognize message components that are included as such.
The system 200 can discover a file 1010 explicitly attached
to a message 1000, and the system 200 can also heuristically
discover textual components 1020-1021 that are implicitly
embedded without MIME structuring in the message. For
example, the system 200 can recognize embedded "uuencoded"
enclosures, base 64 enclosures, PostScript (and PDF)
documents, HTML pages, and MIME fragments.

Accordingly, the system 200 is configured to "hold-back"
such components 1010, 1020-1021 encoded in different
formats using a "MIME" filter 1001. The attached and
embedded components are replaced by hot-links 1031 in a
reduced size message 1030. Only when the user clicks on
one of the hot-links 1031 are the components sent to the
requesting client computer.
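The "hold-back" step can be sketched with Python's standard `email` package as follows. This is only an illustration under stated assumptions: the function name `hold_back`, the link format, and the `base_url` parameter are hypothetical, and the sketch covers explicitly attached MIME parts only, not the heuristic discovery of embedded uuencoded or PostScript components.

```python
from email.message import EmailMessage


def hold_back(message, base_url="/component"):
    """Return (reduced_text, held): the body text plus one link line per attachment.

    `held` maps each link to the attachment bytes, which stay on the server
    until the link is requested.
    """
    body = message.get_body(preferencelist=("plain",))
    lines = [body.get_content().rstrip("\n")] if body is not None else []
    held = {}
    for number, part in enumerate(message.iter_attachments(), start=1):
        link = f"{base_url}/{number}"          # hypothetical link scheme
        held[link] = part.get_payload(decode=True)
        name = part.get_filename() or "unnamed"
        lines.append(f"[{name} ({part.get_content_type()}): {link}]")
    return "\n".join(lines), held


# Build a message with one attached file, then reduce it.
original = EmailMessage()
original.set_content("See the attached picture.")
original.add_attachment(b"GIF89a" + b"\x00" * 1000,
                        maintype="image", subtype="gif", filename="photo.gif")
reduced_text, held_components = hold_back(original)
```

Only `reduced_text` would be sent to the client at first; the bytes in `held_components` are transmitted when the corresponding link is followed.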
Client Computer User Interface 

The following sections describe how the Web browser
115 is configured to provide the e-mail services of the
system 200. The functions described can be displayed as
pull-down menus, or as button bars depending on a desired 
appearance. Preferably, the functions are implemented as 
Java applets. 

File Menu

The file menu has the following options: Administration,
Preferences, and Quit. If the user clicks on the Administration
option button, then the system 200 loads the system
administrative page into the browser 116. Using the
Administrative window, subject to access controls, the user can
view and modify accounts, and view the server log files. The
preferences option is used to modify user preferences 370.
Quit returns to the main log-in window.
Queries Menu 

This menu includes the View Discussion, Name Current
Query, Forget Named Query, Exclude "deleted" Messages,
and Your Named Queries options. The View Discussion option
issues a query for messages related to the currently selected
message. Here, "related" means any messages which share
approximately the same subject line, and/or are in reply to
such a message, or messages linked by a common standard
"RFC822" message ID.

The Name Current Query option allows a user to attach a text
string to the current query. This causes the system 200 to
place the query in the account for the user for subsequent
use. The Forget Named Query option deletes a named query.

The Exclude "deleted" Messages option omits from a
query result all messages that have the deleted label. This is
the default option. Clicking on this option changes the
behavior of the system 200 to include, in response to a query,
"deleted" messages. The Your Named Queries option displays
a particular user's set of named queries 340. Clicking
on any of the displayed names issues the query.
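The behavior of these menu options can be sketched as follows. The class `QuerySession` and its method names are hypothetical, and wrapping the user's query in parentheses before appending the exclusion term is an assumption made here to keep operator grouping unambiguous; the source states only that a term such as "and not deleted" is appended.

```python
class QuerySession:
    """Hypothetical per-user state behind the Queries menu."""

    def __init__(self):
        self.named_queries = {}      # name -> query string (named queries 340)
        self.exclude_deleted = True  # excluding deleted messages is the default
        self.current_query = ""

    def effective_query(self, query):
        """Return the query as issued, with the exclusion term appended if enabled."""
        self.current_query = query
        if self.exclude_deleted:
            # Parenthesizing the user's query is an assumption of this sketch.
            return f"({query}) and not deleted"
        return query

    def name_current_query(self, name):
        """Name Current Query: keep the current query under a text string."""
        self.named_queries[name] = self.current_query

    def forget_named_query(self, name):
        """Forget Named Query: delete a named query if it exists."""
        self.named_queries.pop(name, None)

    def issue_named(self, name):
        """Clicking a displayed name issues the stored query."""
        return self.effective_query(self.named_queries[name])
```

Storing the query without the exclusion term means that toggling the option later also affects named queries, which is one possible design; the source does not say which form is stored.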
Labels Menu 

This menu includes the Record Label, and Forget Label
options. These options respectively allow for the addition and
removal of labels to and from the label state 330.
History Menu 

The client keeps a history of, for example, the last ten
queries to allow for the reissue of queries. The options of this
menu are Go Back, Redo Current Query, Go Forward, and
The History List. Go Back reissues the query preceding
the current query. Redo reissues the current query. This
option is useful to process messages which have recently
arrived, or in the case where the user's actions have altered
the message files 400 in some other manner. Go Forward
reissues the query following the current query. The History
List displays all of the recently issued queries. Any query
listed can be reissued by clicking on the query.
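A query history with these four options can be sketched as follows. The class `QueryHistory` is hypothetical, and discarding the "forward" entries when a new query is composed after going back is an assumption borrowed from typical browser history behavior; the source does not specify it.

```python
class QueryHistory:
    """Hypothetical bounded history of issued queries."""

    def __init__(self, limit=10):
        self.limit = limit   # e.g., the last ten queries
        self.queries = []
        self.position = -1   # index of the current query, -1 when empty

    def issue(self, query):
        """Record a newly composed query, dropping any entries ahead of the current one."""
        self.queries = self.queries[: self.position + 1] + [query]
        self.queries = self.queries[-self.limit:]
        self.position = len(self.queries) - 1
        return query

    def go_back(self):
        """Reissue the query preceding the current query, if there is one."""
        if self.position > 0:
            self.position -= 1
        return self.redo()

    def go_forward(self):
        """Reissue the query following the current query, if there is one."""
        if self.position < len(self.queries) - 1:
            self.position += 1
        return self.redo()

    def redo(self):
        """Reissue the current query, e.g., after new mail has arrived."""
        return self.queries[self.position] if self.queries else None

    def history_list(self):
        """Return all recently issued queries, oldest first."""
        return list(self.queries)
```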
Messages Menu

Options here include: Select All, Select Unread, Select
Read, Mark As Unread, Mark As Read, Add Labels, Remove
Labels, and Use Built-in Viewer. The Select All option
selects all messages which match the current query. The next
two options respectively select messages that do, and do not,
have the unread label. The following options add labels to,
and remove labels from, the currently selected messages.

The user interface normally displays a message by converting
the message to an HTML format and presenting it to
an HTML viewer, which can either be the browser's main
window or a built-in viewer. The last option of the
message menu selects the viewer.
Help Menu 

The help options can be used to display informational 
pages on how to use the various features of the system. The 
help pages are downloaded on demand into the client
computer from the mail service system 200. 
Main Window Menu Bar 

This menu bar includes buttons for the following functions.
The functions are enabled by clicking on the button.
Add: This button is used to add a selected label to a message.
Relabel: This button combines the functions of the unlabel and add functions.
Delete: With this button, a deleted label is added to a message.
Unlabel: Used to remove a single label mentioned in a query from a message.
Next: Selects a next message.
Prev: Selects a preceding message.
Newmail: Issues a query for all messages having the inbox label.
Query: Presents a dialog to compose and issue a query.
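The label-related buttons above reduce to small set operations, as the following sketch illustrates. The function names mirror the button names, but the representation of a message's labels as a Python set and of the mailbox as a dictionary is an assumption of this example. Skipping deleted messages in `newmail` reflects the default exclusion of the deleted label described earlier.

```python
def add(labels, label):
    """Add: return the labels with the selected label added."""
    return labels | {label}


def unlabel(labels, label):
    """Unlabel: return the labels with a single label removed."""
    return labels - {label}


def relabel(labels, old, new):
    """Relabel: combine the unlabel and add functions."""
    return add(unlabel(labels, old), new)


def delete(labels):
    """Delete: add the deleted label; the message itself is kept."""
    return add(labels, "deleted")


def newmail(mailbox):
    """Newmail: ids of messages having the inbox label, excluding deleted ones."""
    return sorted(
        message_id
        for message_id, labels in mailbox.items()
        if "inbox" in labels and "deleted" not in labels
    )
```

Because deletion is just another label, removing the label again with `unlabel` restores the message.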
Message Display Button Bar 

This button bar is used to perform the following functions. 
Detach: Generate a new top-level window to display selected messages.
Compose: Generate a window for composing new mail messages.
Forward: This function sets up a window for composing a
new message. A selected message is attached to the new
message. The attached messages are forwarded without
the need of downloading the messages to the client
computer.
Reply To All: This function sets up a window for composing
a new message with the same recipients as those in a
selected message.
Reply To Sender: Set up a window for composing a new
message to the sender of a selected message.
Composition Window 

Access to the composition window is gained by clicking 
on the Compose, Forward, Reply, or Modify button, or by 
clicking on a "mail-to" hot link in a displayed message. 
Compose begins a new message, forward is used to send a 
previously received message to someone else, reply is to 
respond to a message, and modify allows one to change a
message which has not yet been sent. The mail service 
allows a user to compose multiple messages at a time. 

The text of a message is typed in using an available 
composition window, or generating a window if none are 
available. The exact form of the typing area of the compo- 
sition window depends on the nature of the windowing 
system used on a particular client computer. Typically, while 
typing the user can use short-cuts for editing actions such as 
cut, paste, copy, delete, undo, and so forth. 

Text portions from another message can be inserted by
using the Insert Msg or Quote Msg buttons. If an entire
message is to be included, then the Forward button should
be used. The message will not actually be posted until the 
send function is selected. While the message is being 
composed, it is periodically saved by the mail system. Thus, 
a composition session started using one client computer in 
an office, can easily be completed some time later using 
another computer. 

Send: Sends a message. Any attachments are included
before sending the message. The user is notified of invalid
recipients by a status message, and editing of the message
can continue. Otherwise, the window is switched to
read-only mode.
Close: After a message has been sent, or the discard button
is clicked, this button replaces the send button to allow
one to close the composition window.
Discard: This button is used to discard the message being
composed, and switches the window to read-only. A user
can then click the close or modify buttons.
Modify: After a message has been successfully sent, or if the
discard button has been clicked, this button appears in
place of the discard button to allow the user to compose
another message derived from the current message.
Wrap: This function is used to limit the number of characters
on any one line to eighty, as required by some mailing
systems.
Insert Msg: Replace the selected text with displayed text
from a selected message.
Quote Msg: Replace the selected text with displayed text
from a selected message so that each line is preceded by
the ">" character.
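The Wrap and Quote Msg functions can be sketched with Python's standard `textwrap` module as follows. The function names `wrap` and `quote_msg` are hypothetical, and preserving the existing line breaks while wrapping is an assumption of this example.

```python
import textwrap


def wrap(text, width=80):
    """Limit every line to `width` characters, keeping the existing line breaks."""
    wrapped = []
    for line in text.splitlines():
        # textwrap.wrap returns [] for an empty line, so keep the blank line.
        wrapped.extend(textwrap.wrap(line, width=width) or [""])
    return "\n".join(wrapped)


def quote_msg(text):
    """Precede each line of the quoted text with the ">" character."""
    return "\n".join(">" + line for line in text.splitlines())
```

For example, `quote_msg("hello\nworld")` yields the two lines `>hello` and `>world`.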

Having described a preferred embodiment of the 
invention, it will now become apparent to one skilled in the 
art that other embodiments incorporating its concepts may 
be used. It is felt therefore, that this embodiment should not 
be limited to the disclosed embodiment, but rather should be 
limited only by the spirit and the scope of the appended 
claims. 

We claim: 

1. A computer implemented method for dynamically 
generating an address book in a distributed mail system, the 
distributed mail system including a plurality of client com- 
puters connected to a mail service system via a network, 
comprising:

storing mail messages in message files of the mail service
system;

parsing and indexing each mail message to generate a
full-text index of the mail messages in a memory of the
mail service system;

generating address book mail messages, each address
book mail message including address information;

storing the address book mail messages in the message
files;

parsing and indexing the address book mail messages into
the full-text index file; and

composing a query using a particular one of the plurality
of client computer systems to search the full-text index
to locate and retrieve selected ones of the address book
mail messages as the dynamic address book.

2. The method of claim 1 wherein the address information 
is generated using a form supplied by client mail application 
programs executing on the particular client computer. 

3. The method of claim 2 wherein the client mail appli- 
cation programs are down-loaded to the particular client 
computer from the mail service system. 

4. The method of claim 1 wherein the address information is
selected from the stored mail messages. 

5. The method of claim 1 wherein a particular address 
book mail message includes an address label, and further 
including: 

storing the address label with the particular address book 
mail message, storing the label in the full-text index,
and specifying the address label in the query. 







